A method for the online construction of the set of states of a Markov Decision Process using Answer Set Programming

Abstract

Non-stationary domains, that change in unpredicted ways, are a challenge for agents searching for optimal policies in sequential decision-making problems. This paper presents a combination of Markov Decision Processes (MDP) with Answer Set Programming (ASP), named Online ASP for MDP (oASP(MDP)), which is a method capable of constructing the set of domain states while the agent interacts with a changing environment. oASP(MDP) updates previously obtained policies, learnt by means of Reinforcement Learning (RL), using rules that represent the domain changes observed by the agent. These rules represent a set of domain constraints that are processed as ASP programs, reducing the search space. Results show that oASP(MDP) is capable of finding solutions for problems in non-stationary domains without interfering with the action-value function approximation process.
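The page does not include an implementation; the following is only a minimal Python sketch of the general idea described in the abstract, under the assumption of a tabular Q-learner: the state set is built online as the agent visits states, and rule-based constraints (plain Python predicates standing in here for the ASP programs used by oASP(MDP)) prune state-action pairs before action selection and value updates. All names (OnlineQLearner, add_constraint, valid_actions) are hypothetical and not taken from the paper.

    import random
    from collections import defaultdict

    # Illustrative sketch only: not the authors' oASP(MDP) implementation.
    # Plain Python predicates stand in for ASP programs.
    class OnlineQLearner:
        def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
            self.actions = actions
            self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
            self.states = set()           # state set constructed online
            self.q = defaultdict(float)   # Q(s, a), defaults to 0.0
            self.rules = []               # callables (s, a) -> True if the pair is forbidden

        def observe_state(self, s):
            # Online construction of the state set: add states as they are visited.
            self.states.add(s)

        def add_constraint(self, rule):
            # Register a rule describing an observed domain change;
            # in oASP(MDP) such rules would be encoded as ASP programs.
            self.rules.append(rule)

        def valid_actions(self, s):
            # Keep only actions not excluded by the current constraint rules.
            return [a for a in self.actions
                    if not any(rule(s, a) for rule in self.rules)]

        def choose_action(self, s):
            # Epsilon-greedy selection restricted to non-forbidden actions.
            acts = self.valid_actions(s) or self.actions
            if random.random() < self.epsilon:
                return random.choice(acts)
            return max(acts, key=lambda a: self.q[(s, a)])

        def update(self, s, a, r, s_next):
            # Standard Q-learning update over the pruned action set.
            self.observe_state(s_next)
            best_next = max((self.q[(s_next, b)] for b in self.valid_actions(s_next)),
                            default=0.0)
            self.q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.q[(s, a)])

A domain change observed by the agent would then be supplied as a rule, for example learner.add_constraint(lambda s, a: s == "blocked_cell" and a == "forward"), so that subsequent action selection and updates ignore the now-invalid pair; in the actual method this role is played by ASP programs rather than Python predicates.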
